1 Introduction


Our research, funded under ONR contract N00014-87-K-0796, can be broadly divided into five areas:

1.  design  and  analysis  of  deadline  based  scheduling  policies, 

2.  design  and  evaluation  of  high  performance  and  fault  tolerant  disk  architectures  for  real-time 
systems, 

3. design and evaluation of scheduling policies for real-time tasks with
incremental-value-with-increased-execution-time characteristics,

4.  reliability  and  testing  of  real-time  systems, 

5.  scheduling  for  real-time  parallel  processing  systems. 

These  topics  will  be  the  subject  of  the  remainder  of  the  technical  section.  Additional  details  of  our 
work  can  be  found  in  the  cited  technical  papers  and  reports. 


2  Deadline  Based  Scheduling 

In soft real-time computer and communication systems, temporal constraints are placed on the
behavior of the jobs (e.g., processes or messages) within these systems. Typically, these constraints
require that these jobs initiate or complete some task (e.g., a process's computation or a message's
transmission) within some deadline. In such soft real-time systems, the performance metric of
interest is no longer one of the traditional measures, such as average delay or throughput, but
rather the fraction of jobs that are not able to meet their specified time constraints (i.e., the
fraction of jobs that are lost).

One of our primary goals during this last year was to understand the behavior of two scheduling
policies in such a system with deadlines: the minimum laxity (ML) policy and the earliest deadline
(ED) policy. Under ML, deadlines apply to the beginning of service; under ED, deadlines apply to
the end of service. Both policies schedule the job whose deadline is closest to expiring. In
the following subsections, we summarize the results that we have obtained regarding the optimality
(in terms of minimizing loss) of these policies and then describe two approaches for modeling their
performance.


2.1  Optimality  of  the  ML  and  ED  Policies 

We have been able to establish the following optimality property of the ML and ED policies on
a multiprocessor executing a stream of jobs with arbitrary arrival times and deadlines. Assuming
that job service times are exponentially distributed, we established that over the entire class of
non-idling scheduling policies, both the ML and ED policies maximize the fraction of jobs that
meet their time constraints.
These results hold both for the case that an unlimited number of jobs can reside in the system
and for the case that at most B jobs can reside in the system at any one time. In the latter case,
the “scheduling” policy must also determine which job should be removed from the system whenever
it is full and a new job arrives. We have shown that the policy that removes the job closest to
its deadline, coupled with the ML or ED policy for scheduling jobs, maximizes the fraction of jobs
that meet their deadlines under the same assumptions as above. Details of these results may be
found in [26]. We note that these results are particularly powerful, as they establish the optimality
of two specific scheduling disciplines over a large class of possible real-time scheduling disciplines.

We have also obtained similar results for the case that deadlines are not precisely known but a
“stochastic ordering” exists among the deadlines of all jobs in the system. Details of these results
may be found in [26].

We  have  also  treated  systems  in  which  customers  are  not  removed  when  they  miss  a  deadline. 
See  [7]  for  details. 
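To make the bounded-buffer result above concrete, here is a minimal Python sketch of a waiting room of capacity B that serves jobs in minimum-laxity order and, on overflow, pushes out the job closest to its deadline. The class name `BoundedMLQueue`, the representation of jobs as (absolute deadline, id) pairs, counting only waiting jobs against the capacity, and treating the arriving job as a removal candidate are all assumptions made for illustration, not details taken from [26].

```python
import heapq


class BoundedMLQueue:
    """Waiting room for at most `capacity` jobs, served in ML order.

    Hypothetical sketch: jobs are (absolute_deadline, job_id) pairs, the
    capacity counts waiting jobs only, and an arriving job is itself a
    candidate for removal when the buffer overflows.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []  # min-heap keyed on absolute deadline

    def arrive(self, job_id, deadline):
        """Admit a job; if the buffer overflows, push out the job closest
        to its deadline and return its id (otherwise return None)."""
        heapq.heappush(self.heap, (deadline, job_id))
        if len(self.heap) > self.capacity:
            _, victim = heapq.heappop(self.heap)
            return victim
        return None

    def dispatch(self, now):
        """Start service on the minimum-laxity job that can still begin
        service by its deadline; jobs found already expired are returned
        as lost."""
        lost = []
        while self.heap:
            deadline, job_id = heapq.heappop(self.heap)
            if deadline >= now:
                return job_id, lost
            lost.append(job_id)
        return None, lost
```

Because both rules key on the closest deadline, a single min-heap serves for scheduling and for push-out, and each operation costs O(log B).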

2.2  Bounds  on  the  Performance  of  the  ML  and  ED  Policies 

In  addition  to  studying  the  optimality  properties  of  the  ML  and  ED  policies,  we  also  considered  two 
approaches  towards  evaluating  their  performance.  In  our  first  approach,  we  developed  a  Markovian 
model  that  describes  the  behavior  of  ML  on  a  multiprocessor  under  the  assumptions  of  Poisson 
arrivals  and  exponentially  distributed  service  times  and  deadlines.  A  similar  model  was  developed 
for  the  ED  policy  for  a  single  processor  system  under  identical  assumptions.  Unfortunately,  an 
exact  analysis  of  this  model  is  computationally  intractable  (from  a  practical  standpoint).  However, 
we  were  able  to  develop  tractable  models  which  produce  upper  and  lower  bounds  on  the  fraction  of 
jobs  lost.  These  bounds  can  be  made  arbitrarily  tight  at  the  cost  of  additional  computation.  The 
results  of  this  analysis  can  be  found  in  [12]. 

The Markov model underlying these bounds is based on a new binary simulation of the ML
and ED policies. This simulation can be found in [11] and can be used to develop models for ML
and ED scheduling for the cases in which the service times are generally distributed or the
interarrival times are generally distributed.
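The bounds in [12] are analytical. As an independent sanity check on the quantity being bounded, the fraction of jobs lost can also be estimated by simulation under the same Poisson/exponential assumptions. The sketch below is an illustration for a single processor with deadlines to the beginning of service (the ML setting); the function name, the parameter values, and the FCFS comparison are assumptions, and it is neither the Markov model of [12] nor the simulation of [11].

```python
import heapq
import math
import random


def simulate_loss(policy, n_jobs=50_000, lam=1.0, mu=1.0,
                  mean_laxity=2.0, seed=1):
    """Estimate the fraction of jobs lost on one server.

    Poisson arrivals (rate lam), exponential service times (rate mu), and
    exponential relative deadlines to the *beginning* of service (mean
    mean_laxity). A job still waiting when its deadline passes is lost.
    policy is "ML" (closest deadline first) or "FCFS".
    """
    if policy not in ("ML", "FCFS"):
        raise ValueError("policy must be 'ML' or 'FCFS'")
    arrival_rng = random.Random(seed)
    service_rng = random.Random(seed + 1)
    # Pre-generate (arrival_time, absolute_deadline) so that both policies
    # are driven by the same arrival stream.
    jobs, t = [], 0.0
    for _ in range(n_jobs):
        t += arrival_rng.expovariate(lam)
        jobs.append((t, t + arrival_rng.expovariate(1.0 / mean_laxity)))

    queue = []                      # heap of (priority_key, absolute_deadline)
    now = busy_until = 0.0
    i = lost = served = 0
    while i < n_jobs or queue:
        next_arrival = jobs[i][0] if i < n_jobs else math.inf
        start = max(now, busy_until)    # earliest instant the server is free
        if queue and start <= next_arrival:
            now = start
            # Pop in priority order; jobs whose deadline has passed are lost
            # (lazy removal is exact here because the buffer is unbounded).
            while queue:
                _, deadline = heapq.heappop(queue)
                if deadline >= now:
                    busy_until = now + service_rng.expovariate(mu)
                    served += 1
                    break
                lost += 1
        else:
            now, deadline = jobs[i]
            i += 1
            key = deadline if policy == "ML" else now
            heapq.heappush(queue, (key, deadline))
    assert lost + served == n_jobs
    return lost / n_jobs
```

Calling `simulate_loss("ML")` and `simulate_loss("FCFS")` with the same seed gives a rough, seed-dependent comparison; since FCFS is itself a non-idling policy, the optimality result of Section 2.1 predicts that ML should not lose more jobs in expectation.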

2.3  Exact  Analysis  of  the  ML  Policy 

The performance analysis described in the previous section was approximate and required exponential
assumptions for the interarrival time, service time, and deadline distributions. We have also
developed a computational algorithm for exactly computing customer loss under less restrictive
assumptions, provided the system can be modeled as a discrete-time queueing system. Such a model
would be appropriate in communication networks and computer systems in which events
occur in discrete units of time (e.g., any time-division-multiplexed communication link/network or
computer system in which job execution times and interarrival times are multiples of some minimum
time quantum).

Specifically, we considered a discrete-time queueing system in which the deadline associated
with each customer (the amount of time from a customer's arrival until the time at which it must
begin service) is bounded by some maximum possible value, M time units. Customers in the queue
are scheduled according to the ML policy, and a customer whose deadline expires is considered lost
and is removed from the queue without receiving service. We have been able to exactly analyze
the case of geometrically distributed service times and a bulk arrival process in which the number
of customers arriving in a slot with a deadline of t slots is also geometrically distributed (for each
t, $1 \le t \le M$); we have also demonstrated how this model can be extended to include generally
distributed service times and laxities. The main result of this work has thus been the
development of a numerical algorithm which exactly computes customer loss for this queueing
system with a time complexity that is polynomial in $M$. Details of this work can be found in [17].

2.4  Approximate  ML  and  ED  policies 

One potential drawback of ML scheduling is that the identity of the job with the closest deadline
(minimum laxity) must be determined at each scheduling point, a potentially expensive run-time
cost, especially when the number of queued jobs is large. For example, if jobs are maintained in a
list structure, finding the minimum laxity job in a list that is not ordered by laxity and inserting
into a list kept sorted by laxity are both $O(n)$ operations when there are $n$ jobs queued; if a
dictionary-like structure is used to queue jobs, the time to maintain the data structure is $O(\ln n)$.

We developed a policy ML(n) [12] that approximates the behavior of ML in the following way.
This policy divides the overall queue into two queues, Q1 and Q2, where Q1 can hold at most n
jobs. If the total number of jobs waiting for service is less than or equal to n, they are all held in
Q1. Jobs in Q1 are scheduled according to ML. However, if the number of jobs exceeds n, then
an arriving job is unconditionally placed into Q2. At a service completion instant, the job with
minimum laxity among all jobs queued in Q1 is scheduled for service. When the scheduler moves
a job from Q1 to the server, it also moves a job from Q2 to Q1, selecting the job to enter Q1
on an FCFS basis. In summary, Q1 is an ML queue of size n, Q2 is a FIFO queue of
unbounded size, and Q2 feeds Q1. Note that ML(1) is the same as FCFS and ML(∞) is equivalent to
exact ML.
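The ML(n) structure can be sketched in a few lines of Python. This is a hypothetical illustration: the class name `MLQueueN`, jobs as (absolute deadline, id) pairs, and the omission of expired-job removal are simplifications made here. Selecting from Q1 is a linear scan over at most n entries, so the per-dispatch cost does not grow with the total number of queued jobs.

```python
from collections import deque


class MLQueueN:
    """Hypothetical sketch of ML(n): Q1 holds at most n waiting jobs and is
    served in minimum-laxity order; overflow waits in the FIFO Q2, which
    refills Q1 one job per dispatch. Jobs are (absolute_deadline, job_id)."""

    def __init__(self, n):
        self.n = n
        self.q1 = []          # at most n entries, scanned linearly
        self.q2 = deque()     # unbounded FIFO feeding Q1

    def arrive(self, job_id, deadline):
        # Q1 is always full while Q2 is non-empty, so this test suffices.
        if len(self.q1) < self.n:
            self.q1.append((deadline, job_id))
        else:
            self.q2.append((deadline, job_id))

    def dispatch(self):
        """Serve the minimum-laxity job in Q1 (an O(n) scan, independent of
        the total queue length) and move the head of Q2 into Q1."""
        if not self.q1:
            return None
        best = min(range(len(self.q1)), key=lambda k: self.q1[k][0])
        _, job_id = self.q1.pop(best)
        if self.q2:
            self.q1.append(self.q2.popleft())
        return job_id


def service_order(n, arrivals):
    """Enqueue every (job_id, deadline) pair, then dispatch them all."""
    q = MLQueueN(n)
    for job_id, deadline in arrivals:
        q.arrive(job_id, deadline)
    return [q.dispatch() for _ in arrivals]
```

`service_order` enqueues all jobs before dispatching any, which is enough to show that ML(1) reproduces FCFS order while an n at least as large as the queue reproduces exact ML order.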

In [20] we compared this policy with a variant, first presented in [29], which reverses the positions
of Q1 and Q2, i.e., the ML portion of the queue, Q1, feeds the FIFO portion of the queue, Q2. We
showed that these two seemingly dissimilar policies always make the same scheduling decisions at
the same time.

In [10], we consider four variants of the ML(n) policy that approximate the behavior of ML
scheduling and retain the advantage of having a run-time cost which is independent of
the number of queued customers. Our simulation results show that the best of the four policies
provides a 20-25% improvement over ML(n) and performs within 5% of the exact ML policy over a
wide range of traffic loads and laxity distributions. This policy differs from ML(n) in the following
manner. At arrival, if there are n or more jobs in the system, the laxity of the new arrival is
compared to the laxity of the job with the maximum laxity among the n jobs in Q1. If the laxity
of the new arrival is greater, the new arrival is placed at the end of Q2; otherwise, the new arrival
is placed in the position of the job with maximum laxity among the jobs in Q1, and that
maximum-laxity job is placed at the end of Q2.
Details of the other policies and their performance evaluation can be found in [10].
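The arrival rule of this best-performing variant changes only the admission step. A sketch follows, using the same hypothetical conventions as a plain ML(n) queue (jobs as (absolute deadline, id) pairs, no expired-job removal). Reading "n or more jobs in the system" as "Q1 is full" is an assumption; at a fixed instant, ordering by absolute deadline is the same as ordering by laxity.

```python
from collections import deque


class MLQueueNSwap:
    """Hypothetical sketch of the ML(n) variant described above: an arrival
    that finds Q1 full displaces the maximum-laxity job in Q1 unless the
    arrival's own laxity is greater."""

    def __init__(self, n):
        self.n = n
        self.q1 = []          # at most n (deadline, job_id) entries
        self.q2 = deque()     # FIFO feeding Q1

    def arrive(self, job_id, deadline):
        if len(self.q1) < self.n:
            self.q1.append((deadline, job_id))
            return
        # Q1 is full: compare against the maximum-laxity job in Q1.
        worst = max(range(len(self.q1)), key=lambda k: self.q1[k][0])
        if deadline > self.q1[worst][0]:
            self.q2.append((deadline, job_id))
        else:
            self.q2.append(self.q1[worst])      # displaced job joins the end of Q2
            self.q1[worst] = (deadline, job_id)

    def dispatch(self):
        """Serve the minimum-laxity job in Q1 and refill Q1 from Q2."""
        if not self.q1:
            return None
        best = min(range(len(self.q1)), key=lambda k: self.q1[k][0])
        _, job_id = self.q1.pop(best)
        if self.q2:
            self.q1.append(self.q2.popleft())
        return job_id
```

In the test below, the urgent arrival "c" displaces the job with the largest deadline into Q2 and is served first; under plain ML(2) it would instead have waited in Q2 until one dispatch had occurred.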


3  High  Performance  and  Fault  Tolerant  Real-Time  I/O  Systems 

Throughout our contract, we have concerned ourselves with the development of real-time,
fault-tolerant, high-performance I/O systems.

3.1  Scheduling  policies  for  real-time  disks 

We developed and evaluated the performance of two new disk scheduling algorithms for real-time
systems. These algorithms, called SSEDO (Shortest Seek and Earliest Deadline by Ordering)
and SSEDV (Shortest Seek and Earliest Deadline by Value), combine deadline information and
disk service time information in different ways. The basic idea behind these new algorithms is
to give the disk I/O request with the earliest deadline a high priority; but if a request with a
larger deadline is very close to the current disk arm position, then it may be assigned the highest
priority. The performance of the SSEDO and SSEDV algorithms is compared with three other proposed
real-time disk scheduling algorithms, ED, P-SCAN, and FD-SCAN, as well as four conventional
algorithms, SSTF, SCAN, C-SCAN, and FCFS. An important aspect of the performance study is
that the disk is not evaluated in isolation, but as part of an integrated
collection of protocols necessary to support a real-time transaction system. The transaction system
model was validated on an actual real-time transaction system testbed, called RT-CARAT. The
performance measures of interest are the transaction loss probability and the average response
time for committed transactions under different I/O scheduling algorithms. The results show
that SSEDV outperforms SSEDO; that both of these new algorithms can improve performance
by up to 38% over previously known real-time disk scheduling algorithms; and that all of these
real-time scheduling algorithms are significantly better than non-real-time algorithms in the sense
of minimizing the transaction loss ratio. Details of this study can be found in [3].

Although the performance evaluation was done in the context of a transaction processing system,
the policies can be used in any real-time setting, and the relative rankings of all of the policies studied
should not be affected.
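The idea of trading a request's deadline against its distance from the arm can be illustrated as follows. The exact priority definitions of SSEDO and SSEDV are given in [3]; the two functions below are a hypothetical reading in which SSEDO multiplies each request's seek distance by a weight that grows with its deadline rank, and SSEDV minimizes a weighted sum (parameter `alpha`) of estimated seek time and time remaining to the deadline. The weights, `alpha`, and the `seek_time` model are illustrative assumptions.

```python
def ssedo_pick(requests, arm, weights):
    """SSEDO-style choice (hypothetical weights). requests are
    (deadline, cylinder) pairs. The r-th earliest-deadline request in the
    window has its seek distance multiplied by weights[r] (non-decreasing
    in r); the request with the smallest product is served, and ties go to
    the earlier deadline."""
    window = sorted(requests)[:len(weights)]
    best = min(range(len(window)),
               key=lambda r: weights[r] * abs(window[r][1] - arm))
    return window[best]


def ssedv_pick(requests, arm, now, alpha, seek_time):
    """SSEDV-style choice (hypothetical form): minimize a weighted sum of
    the estimated seek time and the time remaining until the deadline."""
    return min(
        requests,
        key=lambda req: alpha * seek_time(abs(req[1] - arm))
        + (1 - alpha) * (req[0] - now),
    )
```

Raising `alpha` shifts SSEDV toward shortest-seek behavior; with weights `[1, 2]`, SSEDO serves the later-deadline request only when its seek distance is less than half that of the earliest-deadline request.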

3.2  Fault  tolerant  disk  systems 

We developed and evaluated the performance of a number of scheduling algorithms for a pair of
disks where a copy of each data item is maintained on each disk. We considered two classes of
policies: i) centralized queue policies (CQP's) and ii) distributed queue policies (DQP's). A CQP
maintains a central queue of all requests that require servicing by the mirrored disks. In addition,
a second, auxiliary queue may form from time to time at the disk that lags behind. When a disk
becomes available, a request is scheduled to it from the central queue according to some policy. If
the request is a read, then it is served by this disk. If it is an update, then in addition to being
served by the available disk, it is also entered into the auxiliary queue associated with the second
disk. Last, requests are scheduled from the auxiliary queue according to some policy.
A DQP maintains a separate queue at each disk. A read request is assigned to one of these
disks according to some routing rule, whereas an update generates write requests at both disks.
Requests are scheduled at each queue according to some scheduling policy. Last, whenever a read
arrives to find both disks idle, it is routed to the one that will provide the shorter seek time.

Our  studies  have  focussed  on  the  choice  of  scheduling  policies  at  the  queues  and  the  choice  of 
routing  policy  in  the  class  of  DQP’s.  We  have  concluded  that  the  best  performance  (i.e.,  smallest 
fraction  of  jobs  missing  their  deadlines)  is  obtained  with  a  DQP  which  uses  a  join  the  shortest  queue 
routing  policy  for  reads  and  the  SSEDO  (shortest  seek  and  earliest  deadline  by  ordering)  policy  at 
each  of  the  queues.  Details  of  this  work  can  be  found  in  [4,  24];  see  also  [3]  for  details  on  SSEDO. 
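The routing half of the best DQP can be sketched as below (hypothetical function names; the per-queue SSEDO scheduling is not shown). A read joins the shorter queue, except that when both disks are idle it goes to the disk whose arm is closer to the requested cylinder; an update is sent to both disks. Counting a request in service as part of its disk's queue is an assumption.

```python
def route_read(queue_lengths, arm_positions, cylinder):
    """Hypothetical sketch of DQP read routing for a mirrored pair.
    queue_lengths[d] counts requests at disk d, including one in service."""
    disks = range(len(queue_lengths))
    if all(q == 0 for q in queue_lengths):
        # Both disks idle: send the read to the disk with the shorter seek.
        return min(disks, key=lambda d: abs(arm_positions[d] - cylinder))
    # Otherwise join the shortest queue (ties go to the lower-numbered disk).
    return min(disks, key=lambda d: queue_lengths[d])


def route_update(queue_lengths):
    """An update generates a write request at every disk holding a copy."""
    return list(range(len(queue_lengths)))
```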

3.3  Disk  arrays 

Our  work  in  this  area  has  focussed  on  developing  and  evaluating  the  performance  of  different  data 
layout  schemes  and  scheduling  policies  for  two  proposed  disk  array  architectures,  the  mirrored  array 
and  rotated  parity  array. 

In  both  cases,  the  system  consists  of  N  disks.  The  mirrored  array  ensures  that  there  are  two 
copies  of  each  file,  whereas  the  rotated  parity  array  provides  a  parity  block  for  fault  tolerance  for 
every  N  -  1  data  blocks. 
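Fault tolerance in the rotated parity array rests on XOR parity: the parity block of a stripe is the byte-wise XOR of its N - 1 data blocks, so any single lost block equals the XOR of the surviving blocks. The sketch below shows this; the specific rotation rule in the hypothetical `parity_disk` helper is one common choice assumed for illustration, not necessarily the layout studied in [5, 6, 25].

```python
from functools import reduce


def parity_disk(stripe, n_disks):
    """Hypothetical rotation rule: stripe s keeps its parity block on disk
    (n_disks - 1 - s) mod n_disks, so parity updates are spread evenly."""
    return (n_disks - 1 - stripe) % n_disks


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks):
    """A single lost block (data or parity) of a stripe is the XOR of the
    stripe's N - 1 surviving blocks."""
    return xor_blocks(surviving_blocks)
```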

In the case of the mirrored array, we proposed several data allocation schemes and scheduling
policies. We compared their performance in the case that all disks are operational (normal
mode) under two workloads: i) applications in which I/O requests are for small amounts of data
(e.g., transaction processing, workstations), and ii) applications in which I/O requests are for large
amounts of data (e.g., supercomputing, image processing). The main result is that in the normal
mode of operation, a newly proposed group-rotate declustering allocation, coupled with a policy
that assigns each read request to the disk, among those holding the data, with the shortest queue,
provides the lowest mean response time of all of the combinations that we considered. This is true
for both types of applications described above.

In the case of the rotated parity array, we compared the performance of the traditional RAID 5
layout, where files are interleaved across the disks, with the parity striping layout, where each
file is stored on a single disk. We also studied two synchronized I/O scheduling policies, suitable
for both layouts, which provide practical solutions to the problem of write synchronization in
rotated parity arrays. We provided accurate
mathematical models for estimating the mean I/O response times and the maximum throughput
of both layouts, coupled with these two synchronized scheduling policies. Using these models,
we compared the performance of the two layouts with each other. The results show that the
performance of RAID 5 (with a relatively small striping unit of 4 KB) is sensitive to an increase
in the mean request size but not to skew in the access pattern. On the other hand, the parity
striping layout is sensitive to skew in the access pattern. Therefore, depending on the application,
RAID 5 outperforms parity striping in some cases, but is outperformed by parity striping in other
cases. We identify workloads for which each layout provides the best performance. This allows
designers to choose between them for their applications.

Last, we compared the two arrays with each other and observed that the mirrored array architecture
significantly outperforms the rotated parity array architecture when applications generate
I/O requests for small amounts of data. This is true both when the two architectures have the
same number of disks and when they have the same storage capacity. In the case of applications
that generate I/O requests for large amounts of data, the results are not as clear: RAID 5
performs better when most requests are very large, most requests are writes, and most writes
are full-stripe writes.

Similar  results  have  been  obtained  in  the  case  of  soft  real-time  workloads.  Details  of  this  and 
the  above  work  can  be  found  in  [5,  6,  25]. 
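A back-of-the-envelope count of disk I/Os per write helps explain why RAID 5 favors large, full-stripe writes while the mirrored array favors small requests. The sketch uses the standard accounting (read-modify-write versus reconstruct-write, no caching, no queueing); the function names are hypothetical, and this is not the response-time model of [5, 6, 25].

```python
def raid5_write_ios(k, n_disks):
    """Disk I/Os needed to write k data blocks of one RAID 5 stripe
    (n_disks - 1 data blocks + 1 parity block), ignoring caching.

    Read-modify-write: read the k old data blocks and the old parity, then
    write k data blocks and the new parity -> 2k + 2 I/Os.
    Reconstruct-write: read the n_disks - 1 - k untouched data blocks, then
    write k data blocks and the parity -> n_disks I/Os. A full-stripe write
    (k = n_disks - 1) is the case that needs no reads at all.
    """
    if not 1 <= k <= n_disks - 1:
        raise ValueError("k must be between 1 and n_disks - 1")
    return min(2 * k + 2, n_disks)


def mirrored_write_ios(k):
    """Each of the k blocks is written to both copies."""
    return 2 * k
```

For N = 8, a one-block write costs 4 I/Os under RAID 5 versus 2 under mirroring, whereas a full-stripe write of 7 blocks costs 8 I/Os versus 14.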


4 Scheduling Real-Time Tasks with Incremental Reward Characteristics

Many real-time systems can be modeled by a single-server system in which jobs arrive (according
to a certain distribution) and remain in the system for a certain amount of time before departing.
The longer a job is served (while it is present in the system), the more profit is obtained from it.
In typical problems (such as in Artificial Intelligence), the profit curves are concave [2].

In this work, we considered the problem of scheduling jobs whose arrivals are described by an
arbitrary stochastic process. We assume that job $i$ has a lifetime $T_i$ and a profit curve $f_i(x)$,
which is an increasing concave function of the received service $x$.

The objective is to schedule the jobs to maximize the profit per unit time. We refer to the
lifetime $T_i$ of job $i$ as its initial laxity. At any time $t$, let a job be present in the system with
deadline $d$ ($d > t$). The remaining time until the job leaves the system is referred to as the laxity
(as opposed to the initial laxity) of the job. Thus, at time $t$ the laxity of a job with deadline $d$ is
$l = d - t$.

We have formulated and solved a static optimization problem in which we assume the presence at
time $t = 0$ of $N$ jobs with distinct deadlines and profit functions, with no future arrivals. In
the case of arbitrary concave profit functions, the optimal schedule can be obtained with an amount
of computation that is polynomial in $N$. If we assume that the profit curve is of the form $f_i(x) = 1 - \ldots$,
then the complexity is reduced to $O(N \log N)$. We have developed an algorithm that accounts for
arrivals by executing this static policy at each arrival epoch and using the associated schedule until
the following arrival. As this algorithm does not produce a unique schedule to be used during this
period, we have considered the following heuristics.

1.  Shortest  Service  -  In  any  given  interval,  service  the  job  with  the  least  service  so  far. 

2.  Earliest  Deadline  (ED)  -  In  any  given  interval,  service  the  job  with  the  closest  deadline. 

3. Random Selection - In any given interval, service a randomly selected job.
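The three selection rules can be sketched as interchangeable functions. In the report they choose which job to serve within an inter-arrival period after the static optimization; here, purely for illustration, time is split into fixed quanta (a hypothetical `quantum` parameter), each quantum goes to the job picked by the rule among jobs still present, and profit curves are not modeled.

```python
import random
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float          # the job leaves the system at this time
    service: float = 0.0     # service received so far


def shortest_service(jobs, rng):
    return min(jobs, key=lambda j: j.service)


def earliest_deadline(jobs, rng):
    return min(jobs, key=lambda j: j.deadline)


def random_selection(jobs, rng):
    return rng.choice(jobs)


def serve(jobs, rule, now, length, quantum=1.0, seed=0):
    """Split [now, now + length) into quanta and give each quantum to the
    job chosen by `rule` among jobs still present (deadline > time)."""
    rng = random.Random(seed)
    for step in range(round(length / quantum)):
        t = now + step * quantum
        present = [j for j in jobs if j.deadline > t]
        if present:
            rule(present, rng).service += quantum
    return {j.name: j.service for j in jobs}
```

With jobs A (deadline 2) and B (deadline 5) over four unit quanta, ED gives each job 2 units, while Shortest Service alternates at first and gives A only 1 unit before A departs.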

We have compared these different heuristics to each other, to simple first-come-first-served and
last-come-first-served policies, and to simple upper bounds in the case that two different classes of
customers are being served. Here, customers from different classes differ in their profit functions and
laxity distributions. We find that the simple ED policy, in combination with the static optimization
algorithm, performs the best and achieves a performance close to the unachievable upper bound in
most cases.

These  algorithms  are  likely  to  be  of  importance  in  many  real-time  applications  which  involve 
successive  approximations  or  search  procedures  [2,  8,  9,  19]. 


5  Reliability  and  Testing 

5.1 Reliability Modeling of Real-Time Systems with Transient and Correlated Failures

Real-time  systems  can  fail  not  just  because  of  spontaneously-occurring  failures,  but  also  because 
of  events,  such  as  a  burst  of  electromagnetic  or  elementary-particle  radiation.  Such  environmental 
upsets  can  cause  both  transient  and  permanent  failure.  Since  the  entire  system  is  bathed  in  the 
same  environment,  such  failures  can  be  correlated.  However,  despite  the  occurrence  of  correlated 
failures,  contemporary  reliability  models  have  largely  ignored  them,  assuming  instead  that  failures 
are  independent. 

Our work on developing the mathematical underpinnings of a model which accounts for both
correlated and independently occurring failures has been described in an earlier report. Over the
past year, we have completed work on the first version of a software package which implements this
earlier research.

The  package,  written  in  C,  considers  only  processor  failures:  the  failure  of  the  software,  the  I/O 
components,  etc.,  is  not  considered.  Our  future  work  will  extend  it  to  include  these  factors. 

The  package  accepts  as  input  the  number  of  processors,  the  inter-checkpointing  interval,  the 
spontaneous  permanent  and  transient  failure  rates  of  processors,  the  critical  workload  schedule  of 
each  processor,  and  the  characteristics  of  the  operating  environment.  The  latter  is  represented  by 
the  transition  rates  of  the  environment  between  a  variety  of  states.  Each  of  the  environmental 
states  represents  some  degree  of  stress  imposed  by  the  environment  on  the  computer  system,  and 
thus  represents  a  given  increase  in  the  permanent  or  transient  failure  rates.  The  package  allows  the 
system to be heterogeneous, i.e., the failure rates and susceptibility to the operating environment
can  vary  from  processor  to  processor.  The  program  output  is  the  probability  that  the  critical 
workload  is  executed  on  time  throughout  a  specified  interval  of  operation. 

The features of this package which distinguish it from others are that (i) correlated failures and
(ii) the periodic critical workload schedule are taken into account. It can be used, for example, to
evaluate the performance of different task schedules (including staggered schedules) and the effects
of shielding against the environment.

5.2  Scheduling  Tasks  on  Real-Time  Systems  Subject  to  Correlated  Failure 

In this part of the work, we studied real-time systems operating in hostile environments [15]. We
have obtained a heuristic to carry out scheduling of tasks on real-time systems that are subject to
correlated transient failure. The tasks have cost functions associated with them; that is, $C_i(t)$ is
the cost of having a response time $t$ for an iteration of task $i$. These functions express the cost,
to the controlled process, of the response times of the various tasks. The objective is to minimize the total
cost and maximize the available time redundancy (required to defeat correlated transient failure),
subject to the requirement that all deadlines be met. Since this problem is NP-hard, we
have focussed on developing a heuristic. The heuristic is in two parts.

At each decision point (when a new task becomes available or a currently executing task
completes), the system has to determine which of the available tasks should be run. Obtaining a
minimum-cost schedule would require us to generate a search tree with as many levels as there are
tasks, and to evaluate it exhaustively. This, however, is very expensive, and so only a subset of the
entire set of available tasks is considered. To determine which tasks are to be in this set (called the
lookahead set), we use the following procedure for the first part of the heuristic.

From the set of tasks that are available for dispatch at decision time $t$ (called the current set,
$C(t)$), choose the task $i$ which is most expensive to execute last (i.e., after all the other tasks in
$C(t)$). Find the slope of the cost function of task $i$ both at time $t$ and at the time when all
the other tasks in $C(t)$ finish. Call these slopes $s_{i1}$ and $s_{i2}$, respectively. Include in the
lookahead set task $i$, as well as those tasks in $C(t)$ whose cost functions at time $t$ have slopes
greater than $\min\{s_{i1}, s_{i2}\}$. This lookahead set is then searched exhaustively to determine
which task should be dispatched.
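The first part of the heuristic can be sketched as follows. Assumptions made for illustration, not taken from [15]: each task is a pair (execution time, cost as a function of completion time), all tasks in C(t) were released at time 0 so completion time equals response time, slopes are estimated by central differences, and the exhaustive search minimizes the summed cost of the lookahead tasks only, run back to back from time t.

```python
from itertools import permutations


def slope(cost, t, h=1e-6):
    """Central-difference estimate of d cost / dt at t."""
    return (cost(t + h) - cost(t - h)) / (2 * h)


def pick_task(tasks, t):
    """tasks maps name -> (execution_time, cost), where cost(finish_time)
    is the task's cost if it completes at finish_time. Returns
    (task_to_dispatch, lookahead_set)."""
    total = sum(e for e, _ in tasks.values())
    # Task i: the one that is most expensive to execute last.
    i = max(tasks, key=lambda n: tasks[n][1](t + total))
    exec_i, cost_i = tasks[i]
    s_i1 = slope(cost_i, t)                    # slope at the decision time
    s_i2 = slope(cost_i, t + total - exec_i)   # slope when the others finish
    threshold = min(s_i1, s_i2)
    lookahead = {i} | {n for n, (_, c) in tasks.items() if slope(c, t) > threshold}

    def order_cost(order):
        finish, cost = t, 0.0
        for n in order:
            finish += tasks[n][0]
            cost += tasks[n][1](finish)
        return cost

    # Exhaustive search over orderings of the lookahead set only.
    best = min(permutations(sorted(lookahead)), key=order_cost)
    return best[0], lookahead
```

In the test below, the cubic-cost task B is the most expensive to run last; its slope at t is nearly zero, so the threshold admits every task whose cost is already rising, and the search then places B first.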

Note that the above actions only check the slope of the cost function for task $i$ at two time
instants. Since cost functions can be highly nonlinear, this can lead to anomalies. The second
part of the heuristic, called the compaction step, attempts to correct these anomalies. Compaction
starts  with  the  schedule  for  the  entire  task  set,  obtained  as  described  above.  The  last  task  is  deleted 
from  the  task  set,  and  a  new  schedule  is  obtained  for  the  rest  of  the  task  set.  This  last  task  is  then 
inserted  in  the  first  unused  miniframes  during  which  the  task  is  available.  Note  that  the  generation 
of  the  new  schedule  is  done  recursively,  with  the  last  task  in  each  schedule  being  deleted  from  the 
entire  task  set  at  each  point  until  no  tasks  remain,  and  then  the  schedule  is  gradually  built  up  by 
reinserting  the  deleted  tasks  as  specified  above. 

We have run extensive simulations to compare the quality of the schedules generated by this
algorithm  against  those  of  the  optimal  algorithm  (using  exhaustive  enumeration).  In  the  vast 
majority  of  cases,  our  algorithm  produces  schedules  whose  costs  are  within  5%  of  the  optimal. 

So far, we have said nothing about maximizing the time redundancy, i.e., the amount of slack
in the schedule after all the critical tasks have been inserted. To do this, we proceed as follows.
Suppose there are $2m + 1$ copies of each task that have to be run. Then, we first schedule only
$m + 1$ copies of each task. Following this, we add to the schedule an $(m + 1 + j)$th copy of each
task, and place it in the schedule appropriately, for $j = 1, 2, \ldots, m$. In every case, the placement of
the $(m + k)$th copy of each critical task is done prior to the introduction into the schedule of the
$(m + l)$th copy for $l > k$, and the positioning of the $(m + k)$th copy of any task is unaffected by
the positioning of the $(m + l)$th copy of any other task, except insofar as may be required to meet
the deadlines of all the tasks.

This results in the tasks being staggered in such a way that the $(m + k)$th copy of each task
is completed before the $(m + l)$th copy of the same task (for $l > k \ge 1$). As a result, whenever a
total of $m + 1$ identical outputs are produced by the first $m + 1$ correctly functioning copies of a
task, the rest of the copies can be dispensed with, thus generating additional slack in the schedule.
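The payoff of staggering can be sketched as a vote that stops early. The hypothetical helper below consumes the outputs of the 2m + 1 copies in completion order, compares them by exact equality (an assumption), and stops as soon as some value has m + 1 votes; the copies not yet run are the slack recovered.

```python
from collections import Counter


def vote_with_early_stop(outputs, m):
    """outputs: results of the 2m + 1 copies of a task, in the staggered
    order in which the copies complete. Stop as soon as some value has
    been produced by m + 1 copies; the remaining copies need not run.
    Returns (agreed_value, copies_run), or (None, copies_run) if no value
    ever reaches m + 1 votes."""
    counts = Counter()
    copies_run = 0
    for value in outputs:
        copies_run += 1
        counts[value] += 1
        if counts[value] == m + 1:
            return value, copies_run
    return None, copies_run
```

With m = 1 and fault-free copies, only two of the three copies run; a single faulty output forces the third copy to run.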

5.3  Impact  of  Workload  on  the  Reliability  of  Real-Time  Systems 

Most real-time systems employ N-modular redundancy (most commonly triple-modular redundancy) for
fault tolerance. When a processor in a triad fails permanently, a spare processor (if available) must be
switched in to take its place. If the failure is transient, the affected processor will be brought back
into the triad after it recovers.

In  either  case,  it  is  necessary  to  make  the  memory  of  all  three  members  of  the  triad  consistent. 
This can be done by copying into the recovering or substitute processor the writeable memory of
the  two  processors  that  are  still  functional.  The  time  required  for  this  can  depend  on  the  workload, 
and  the  rate  at  which  this  workload  writes  into  its  memory. 

Until  the  processor  has  recovered,  is  resynchronized  with  its  colleagues  in  the  triad,  and  resumes 
normal  operation,  the  triad  is  effectively  a  duplex  and  will  suffer  fatal  failure  if  one  more  of  its 
processors  fails. 

In this work, we have modeled the impact of workload on the recovery time, and therefore on the reliability, of processor triads. We have shown that there is a knee beyond which allocating more tasks to the processors increases the fatal failure rate dramatically. Our current work deals with the implications of this fact for the allocation of tasks in real-time systems. Details can be found in [16].
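To see why such a knee can arise, consider a deliberately simple model that is our own assumption, not the model of [16]. Memory is copied at a fixed rate c while the workload keeps dirtying words at rate w, and each pass re-copies what the previous pass dirtied, so the geometric series of passes gives a recovery time of M/(c - w). The two surviving processors are assumed to fail independently at exponential rate λ. All names and parameter values below are hypothetical.

```python
import math

def recovery_time(mem_words: float, copy_rate: float, write_rate: float) -> float:
    """Time to resynchronize a triad member (hypothetical model).

    Each copying pass re-copies the words dirtied during the previous pass,
    so the total time is M/c * (1 + w/c + (w/c)**2 + ...) = M / (c - w).
    """
    if write_rate >= copy_rate:
        return math.inf  # copying never catches up with the workload
    return mem_words / (copy_rate - write_rate)

def p_fatal(lam: float, t_rec: float) -> float:
    """Probability that either of the two surviving processors fails
    (independent exponential lifetimes with rate lam) before recovery ends."""
    return 1.0 - math.exp(-2.0 * lam * t_rec)

if __name__ == "__main__":
    mem, copy_rate, per_task_writes, lam = 1e6, 1e6, 1e5, 1e-4
    for n_tasks in range(1, 11):
        t = recovery_time(mem, copy_rate, n_tasks * per_task_writes)
        print(f"{n_tasks:2d} tasks: recovery {t:8.2f} s, P(fatal) {p_fatal(lam, t):.2e}")
```

In this toy setting the recovery time grows as 1/(c - w), so the fatal-failure probability rises slowly at first and then sharply as the total write rate of the allocated tasks approaches the copy rate.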

5.4  Distributed  Recovery  Algorithms  for  Distributed  Systems 

One component of our initial proposal was the development and performance evaluation of distributed algorithms for recovering from faults in a distributed real-time computer system. Briefly, the algorithm that we proposed requires that a node, henceforth referred to as the primary node, transmit one or more copies of a job, at the time of its arrival, to other nodes in the system, referred to as secondary nodes. The secondary nodes are responsible for monitoring the primary node for failure. If a failure occurs before the job completes, the secondary nodes select one of themselves to be the new primary node, responsible for completing the job.
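The monitoring and takeover steps can be sketched as follows. This is an illustration under our own assumptions: failure detection by heartbeat timeout, and a deterministic rule (the lowest-numbered live secondary wins) standing in for whatever selection the proposed algorithm uses. `last_heartbeat`, `timeout` and `choose_primary` are hypothetical names.

```python
from typing import Dict, Iterable, Optional

def choose_primary(last_heartbeat: Dict[int, float], primary: int,
                   secondaries: Iterable[int], now: float,
                   timeout: float) -> Optional[int]:
    """Return the node responsible for the job after a monitoring round."""
    def alive(node: int) -> bool:
        # A node is presumed failed if it has been silent for longer than timeout.
        return now - last_heartbeat.get(node, float("-inf")) <= timeout

    if alive(primary):
        return primary  # no failure detected; nothing to do
    live = sorted(n for n in secondaries if alive(n))
    # Every secondary applies the same rule, so all agree without extra
    # messages, assuming they share the same view of the heartbeats.
    return live[0] if live else None  # None: every copy of the job is lost
```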

There are many interesting variations of this basic policy, and thus our first task was to develop a simple analytical model that can be used to study the performance of these variations. We have chosen an approach in which we decompose the system of N nodes into N models, one for each node in the system. The interactions between these nodes are captured by the values of the input parameters of each of these models. As these parameter values are unknown, this yields a fixed point problem, i.e., a set of nonlinear equations with these parameters as unknowns. We have developed a detailed model for a single node which can be used in such an approximate evaluation; our model accounts for the effects of node failures and the communication costs of transferring job copies from the primary node to the secondary nodes. The details of this model and its analysis can be found in [27].
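The decomposition leads to an iteration of the following kind. The sketch solves x = g(x) by damped successive substitution, a standard technique for such fixed point problems; it is not necessarily the solution method of [27]. The per-node model inside `make_toy_model` is a hypothetical stand-in: node j accepts jobs at rate λ_j(1 - ρ_j), where its utilization ρ_j grows with the rate x_j of copies it receives, and it spreads one copy of each accepted job evenly over the other N - 1 nodes.

```python
from typing import Callable, List, Sequence

Vector = List[float]

def solve_fixed_point(g: Callable[[Vector], Vector], x0: Sequence[float],
                      tol: float = 1e-10, max_iter: int = 10_000,
                      damping: float = 0.5) -> Vector:
    """Damped successive substitution: x <- (1 - d) * x + d * g(x)."""
    x = list(x0)
    for _ in range(max_iter):
        gx = g(x)
        new = [(1 - damping) * xi + damping * gi for xi, gi in zip(x, gx)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("fixed-point iteration did not converge")

def make_toy_model(lams: Sequence[float], mu: float,
                   copy_cost: float) -> Callable[[Vector], Vector]:
    """Hypothetical per-node model; x[i] is the rate of copies node i receives."""
    n = len(lams)

    def g(x: Vector) -> Vector:
        accepted = []
        for lam, xj in zip(lams, x):
            rho = min(0.99, (lam + copy_cost * xj) / mu)  # load incl. copy handling
            accepted.append(lam * (1.0 - rho))
        total = sum(accepted)
        # Each node receives copies of the jobs accepted by the other n - 1 nodes.
        return [(total - a) / (n - 1) for a in accepted]

    return g
```

For example, `solve_fixed_point(make_toy_model([0.3, 0.5, 0.4], mu=1.0, copy_cost=0.2), [0.0, 0.0, 0.0])` returns the copy rates at which every node's model is consistent with the others. Damping trades speed for robustness when g is not a strong contraction.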
5.5 Scheduling Tests of Software for Real-Time Systems

The reliability of real-time systems depends greatly on the reliability of the application and system software that runs on them. Two approaches to reliable software have been proposed in the literature. The first approach, called the recovery block approach, uses a primary version and a secondary (or backup) version. An acceptance test determines whether or not the output is likely to be correct. The acceptance test is not perfect: it is assumed to detect an erroneous output with probability c < 1, where c is called the coverage. If the primary is judged to have produced an erroneous result, the secondary is invoked.
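A recovery block and a first-order estimate of its per-invocation success probability can be sketched as follows. The probability formula rests on assumptions that are ours: the acceptance test never rejects a correct output, it catches an erroneous primary output with probability c, and the secondary's output is correct with probability 1 - p_s. The names are hypothetical, and the sketch also runs the acceptance test on the secondary's output, a point the description above leaves open.

```python
import math
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def recovery_block(primary: Callable[..., T], secondary: Callable[..., T],
                   acceptance_test: Callable[..., bool], *args) -> Optional[T]:
    """Run the primary; invoke the secondary only if the test rejects it."""
    result = primary(*args)
    if acceptance_test(result, *args):
        return result
    result = secondary(*args)
    # Returning None signals that the block as a whole has failed.
    return result if acceptance_test(result, *args) else None

def p_correct(p_p: float, p_s: float, c: float) -> float:
    """P(correct output) = primary correct, or primary wrong but caught
    (probability c) and secondary correct."""
    return (1 - p_p) + p_p * c * (1 - p_s)

if __name__ == "__main__":
    # Hypothetical example: a buggy primary square root and a library backup.
    accept = lambda r, x: r >= 0 and abs(r * r - x) < 1e-9
    print(recovery_block(lambda x: -1.0, math.sqrt, accept, 2.0))
    print(p_correct(0.1, 0.2, 0.9))
```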

The second approach, known as N-version programming, is a software analogue of N-modular redundancy. N versions of the software are independently produced and run in parallel, and their results are voted on as in N-modular redundancy.
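The voting step of N-version programming can be sketched as a plain majority vote over the versions' outputs. This assumes exact agreement of outputs (practical voters often need tolerances for floating-point results) and returns `None` when no strict majority exists; the function names are ours.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence

def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of versions, if any."""
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None

def n_version(versions: Sequence[Callable[..., Hashable]], *args) -> Optional[Hashable]:
    """Run independently developed versions (sequentially here) and vote."""
    return majority_vote([version(*args) for version in versions])
```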

In this work, we provide a simple reliability model for N-version programming and the recovery block scheme which can provide guidance for the quasi-optimal allocation of software debug time among the different versions [28].

We assume that the software error generation rate is a Weibull function of the debug time t, i.e.,

$$\lambda(t) = \lambda_0 e^{-\beta t^{\alpha}}$$

where β and α are constants that characterize the software being debugged, and λ_0 is the initial failure rate of the software. The Weibull distribution was chosen because of its generality; note in particular that the popular exponential distribution is a special case of the Weibull.

We have obtained expressions for the Mean Time to Failure (MTTF) of both schemes as a function of the debug time of the various modules. In particular, our results show that when the coverage of the acceptance test in the recovery-block approach is imperfect (i.e., c < 1), most of the debug time should be spent on the primary. Indeed, the share of the debug time allocated to the primary tends to increase as the coverage decreases. The expression for the MTTF of N-version programming is too complex to maximize analytically; instead, we show how to use numerical techniques to allocate the debug time optimally.
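The numerical allocation idea can be illustrated with a toy version of the recovery-block trade-off. Everything model-specific here is an assumption of ours, not the expressions of [28]: the error rate after t units of debugging is taken to be λ(t) = λ_0 exp(-β t^α), the rates are treated directly as per-run error probabilities (so λ_0 ≤ 1), and a run fails if the primary errs and the error escapes the acceptance test, or is caught and the secondary also errs. A grid search then splits a fixed debug budget between the two versions.

```python
import math

def error_rate(t: float, lam0: float, beta: float, alpha: float) -> float:
    """Assumed Weibull-type decay of the error rate with debug time t."""
    return lam0 * math.exp(-beta * t ** alpha)

def rb_failure_prob(t_p: float, t_s: float, c: float, lam0: float = 1.0,
                    beta: float = 1.0, alpha: float = 0.5) -> float:
    """Toy per-run failure probability of a recovery block."""
    q_p = error_rate(t_p, lam0, beta, alpha)  # primary errs
    q_s = error_rate(t_s, lam0, beta, alpha)  # secondary errs
    # Primary error escapes the test, or is caught and the secondary also errs.
    return q_p * ((1 - c) + c * q_s)

def best_primary_share(total: float, c: float, steps: int = 2000, **params) -> float:
    """Fraction of the debug budget given to the primary that minimizes
    the toy failure probability (exhaustive grid search)."""
    best_t_p, best_f = 0.0, math.inf
    for i in range(steps + 1):
        t_p = total * i / steps
        f = rb_failure_prob(t_p, total - t_p, c, **params)
        if f < best_f:
            best_t_p, best_f = t_p, f
    return best_t_p / total
```

With the defaults above (α = 0.5, β = 1, λ_0 = 1) and a budget of 16, the toy model splits the budget evenly at c = 1, gives the primary a little over half at c = 0.99, and more than 90% at c = 0.5, in line with the trend reported above. A grid search is crude but easily extends to the N-version case.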

5.6  Optimal  Scheduling  of  Signature  Analysis  Tests 

A second effort in the area of reliability has studied methods to optimally schedule tests in real-time systems. Fault-tolerant systems need to undertake regular and mutual testing to flush out latent faults and reconfigure the system in response. Signature analysis has rapidly gained popularity as a testing procedure over the last few years. It consists of applying a sequence of test inputs to the device under test and compressing the outputs. This compressed output is then compared against a reference; any discrepancy indicates a faulty device.
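Output compression in signature analysis is commonly done with a linear feedback shift register. The sketch below uses a serial 16-bit register with the CRC-CCITT feedback polynomial x^16 + x^12 + x^5 + 1; the polynomial and register width are our choice for illustration, not parameters from this work. Because the register is linear and the polynomial has a nonzero constant term, any single-bit error in the output stream changes the signature, although some multi-bit errors can alias to the reference.

```python
from typing import Iterable

def serial_signature(bits: Iterable[int], poly: int = 0x1021, width: int = 16) -> int:
    """Compress a stream of output bits into a width-bit signature.

    Equivalent to the remainder of M(x) * x**width divided by the feedback
    polynomial (register initialized to zero).
    """
    mask = (1 << width) - 1
    top = 1 << (width - 1)
    state = 0
    for bit in bits:
        feedback = (1 if state & top else 0) ^ (bit & 1)
        state = (state << 1) & mask
        if feedback:
            state ^= poly  # apply the feedback taps
    return state
```

A device passes if `serial_signature(observed_bits)` equals the signature of the fault-free response.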

The convention is to apply the entire test sequence to the device and then compare the compressed output against the reference. However, if a fault is uncovered early, the rest of the test can be dispensed with, thus saving time. It therefore makes sense to embed additional comparisons against the reference, i.e., to break the sequence of tests down into subsequences. At the end of each subsequence, if any faults have been uncovered, testing stops, since the device under test is faulty and must be purged; otherwise, the next subsequence (if one is available) is applied.

Our accomplishment has been to obtain an algorithm which breaks the test sequence down into a set of subsequences so that the expected testing time per device is minimized. Our work is thus likely to reduce the testing overhead in operational real-time systems. Over the past year, we have tested this algorithm on the benchmark circuits of Brglez et al. and shown its practical usefulness. The inputs to our algorithm are: the probability that the device under test is faulty, the coverage function of the test sequence, and the overhead consumed in applying tests and comparing the test outputs against the corresponding reference. This work is described in detail in [18]. We plan to extend this work to include board-level diagnosis. This will involve incorporating search algorithms into our work to locate, within the whole sequence, a test input that exercises the fault.
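One way to compute such a partition, under a simple cost model of our own, is dynamic programming over break points; this is an illustration, not necessarily the algorithm of [18]. We assume each test vector takes one time unit, each comparison adds a fixed overhead, a good device is never rejected, and a fault detected within a subsequence is caught at that subsequence's comparison. Subsequence (a, b] is then applied with probability 1 - p·F(a), where p is the probability that the device is faulty and F(k) is the coverage after k vectors, so the expected time is a sum over subsequences and decomposes for dynamic programming.

```python
import math
from typing import Callable, List, Tuple

def optimal_breakpoints(n: int, p_faulty: float, coverage: Callable[[int], float],
                        overhead: float) -> Tuple[float, List[int]]:
    """Return (minimal expected testing time, comparison points).

    coverage(k) = P(a faulty device is detected within the first k vectors),
    with coverage(0) == 0. best[b] is the minimal expected time to finish
    the first b vectors with a comparison right after vector b.
    """
    best = [0.0] + [math.inf] * n
    prev = [0] * (n + 1)
    for b in range(1, n + 1):
        for a in range(b):
            reach = 1.0 - p_faulty * coverage(a)  # P(testing reaches vector a + 1)
            cost = best[a] + reach * (b - a + overhead)
            if cost < best[b]:
                best[b], prev[b] = cost, a
    cuts, b = [], n
    while b > 0:  # walk back through the chosen comparison points
        cuts.append(b)
        b = prev[b]
    return best[n], cuts[::-1]
```

For instance, with a hypothetical coverage function `lambda k: 1 - math.exp(-k / 20)`, 100 vectors and p = 0.3, a huge comparison overhead yields a single comparison at the end, while a zero overhead yields a comparison after every vector; intermediate overheads fall in between. The O(n²) cost is modest for the sequence lengths typical of signature analysis.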

5.7  Optimizing  Wafer-Probe  Testing 

The VLSI chips that make up highly reliable systems must be thoroughly tested during manufacture to ensure that defect levels¹ are suitably low. Unfortunately, exhaustive testing is out of the question and even the best testing procedure is imperfect; that is, defective chips can pass the test and be incorporated into a product.

Thus, it is always of interest (i) to optimize the test effort required to achieve a given defect level, and (ii) to achieve the lowest possible defect level given the best available (imperfect) testing procedure. In this work, we have developed a novel approach for improving the effectiveness of wafer-test procedures by obtaining and using yield estimates for individual dies on the wafer before the wafer is diced.

Our approach is based on the observation that defects on a wafer are not uniformly distributed, but have long been known to exhibit clustering. Most of the good dies on a wafer are found adjacent to other good dies, while defective dies tend to be similarly clustered. This suggests that if we know the state of some or all of the neighbors of a given die, we can obtain a better estimate of its yield. We have shown how to use this improved yield estimate to optimize the test applied to the die. We have calculated that in typical cases, for the same testing time, we can manage the test process so as to halve defect levels, and that we can identify dies whose defect level is about 20 times lower than that of the overall lot.
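The clustering effect can be seen directly on a wafer map. The sketch estimates, from a map of known good (1) and bad (0) dies, the fraction of good dies among interior dies with exactly k good neighbors out of 8. This empirical conditional estimate is our own simple illustration of the idea, not the yield model developed in this work; skipping edge dies is also a simplification.

```python
from collections import defaultdict
from typing import Dict, List

def conditional_yield(wafer: List[List[int]]) -> Dict[int, float]:
    """Map k (good neighbors among 8) -> fraction of such dies that are good."""
    rows, cols = len(wafer), len(wafer[0])
    seen: Dict[int, int] = defaultdict(int)
    good: Dict[int, int] = defaultdict(int)
    for r in range(1, rows - 1):          # interior dies only: all 8 neighbors exist
        for c in range(1, cols - 1):
            k = sum(wafer[r + dr][c + dc]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            seen[k] += 1
            good[k] += wafer[r][c]
    return {k: good[k] / seen[k] for k in sorted(seen)}
```

On a map whose left half is good and right half bad, the overall yield is 50%, yet every interior die with 5 or more good neighbors is good and every one with 3 or fewer is bad, which is the kind of sharpening a neighbor-aware estimate exploits.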

6  Parallel  Systems 

As part of our research, we designed and evaluated the performance of scheduling policies for parallel systems executing jobs with real-time constraints. We have focused on two aspects of this problem:

•  The  analysis  of  priority  scheduling  policies  for  multiprocessors  executing  parallel  applications. 

•  The determination of optimal scheduling policies for (both real-time and non-real-time) parallel processing systems executing parallel applications.

¹The defect level is the ratio of bad chips which pass the test to all the chips that pass the test.
6.1 Priority Policies for Multiprocessors

We have developed simple models for a multiprocessor which executes a stream of K classes of jobs, each of which consists of a random number of tasks that can be executed independently of each other; we refer to such jobs as fork-join jobs. Several priority scheduling policies have been analyzed: a) a strict non-preemptive head-of-the-line policy, b) a preemptive policy that allows preemptions at the job level, c) a preemptive policy that allows preemptions at the task level, and d) a policy where the priority is a non-decreasing function of the number of tasks in the queue, with preemptions at the job level. Using these models, we have compared the mean job response time for the different classes under the various scheduling policies and under FCFS scheduling. We have also compared the performance of these policies to that of a system in which the processors are partitioned so that classes are allocated only to certain processor groups. Our results have shown that, for the system considered, the task preemption policy yields uniformly better class response times and is thus preferable to a system with partitioned processors. Details of this study will be available in a forthcoming report [22].

We are also now completing a related study of the behavior of the first-come-first-serve scheduling policy for fork-join jobs on a multiprocessor. Our results here have included a characterization of the response time distribution under Markovian assumptions, the development of computationally efficient upper and lower bounds for the moments of the response time, and a proof that FCFS is the policy that minimizes (and last-come-first-serve (LCFS) the policy that maximizes) the expected value of a convex function of the response time. This last property implies that FCFS minimizes (and LCFS maximizes) any moment of the response time distribution. From a practical standpoint, this means that LCFS maximizes the fraction of jobs that complete within their deadline in a real-time system in which all jobs eventually receive service (whether or not they miss their deadlines) under a general set of assumptions. Details of this study can be found in [21]. We note once again that we believe this is a strong result, as it establishes the optimality of LCFS over a broad range of possible scheduling disciplines.
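The contrast between FCFS and LCFS can be checked on small examples with a simulation of fork-join jobs under non-preemptive, task-level FCFS or LCFS on identical processors. The simulator and the job data below are hypothetical illustrations, not the model of [21]; a job's response time is the completion time of its last task minus its arrival time, and each job is assumed to have at least one task.

```python
import heapq
from typing import List, Sequence, Tuple

Job = Tuple[float, Sequence[float]]  # (arrival time, service times of its tasks)

def simulate(jobs: Sequence[Job], processors: int,
             discipline: str = "FCFS") -> List[float]:
    """Non-preemptive task-level scheduling of fork-join jobs.

    Whenever a processor becomes idle it starts a task of the earliest-arrived
    (FCFS) or latest-arrived (LCFS) job that has arrived and still has
    unstarted tasks. Returns each job's response time.
    """
    if discipline not in ("FCFS", "LCFS"):
        raise ValueError("discipline must be 'FCFS' or 'LCFS'")
    sign = 1.0 if discipline == "FCFS" else -1.0
    pending = [list(tasks) for _, tasks in jobs]   # unstarted tasks per job
    finish = [0.0] * len(jobs)
    idle_at = [0.0] * processors                   # min-heap of idle times
    remaining = sum(len(p) for p in pending)
    while remaining:
        t = heapq.heappop(idle_at)
        waiting = [j for j in range(len(jobs)) if pending[j]]
        ready = [j for j in waiting if jobs[j][0] <= t]
        if not ready:                              # stay idle until next arrival
            t = min(jobs[j][0] for j in waiting)
            ready = [j for j in waiting if jobs[j][0] <= t]
        j = min(ready, key=lambda i: (sign * jobs[i][0], i))
        end = t + pending[j].pop(0)
        finish[j] = max(finish[j], end)            # job ends with its last task
        heapq.heappush(idle_at, end)
        remaining -= 1
    return [finish[j] - jobs[j][0] for j in range(len(jobs))]
```

For three unit-length jobs arriving at times 0, 0.2 and 0.9 on a single processor, both disciplines give the same mean response time, but FCFS gives the smaller sum of squared response times (8.65 vs. 10.05), while LCFS lets two jobs, rather than one, finish within a response-time bound of 1.5.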

6.2 Optimality Results

We also completed a study of the effects of scheduling disciplines on the performance of parallel systems (both with and without real-time constraints). A job is composed of a set of tasks, with a partial order specifying the precedence constraints between the tasks. We assume that a predefined mapping of the tasks to processors has been given and that the processors execute a stream of jobs, all with the same task graph and task/processor allocation. Our goal is to study the effects of different local scheduling policies at each of the processors on the job throughput, the number of jobs in the system, and the job response time.

We  have  been  able  to  establish  several  important  results.  First,  we  have  been  able  to  show 
that  the  FCFS  policy  applied  at  the  task  level  minimizes  the  number  of  jobs  in  the  system  and 
maximizes  the  throughput.  Second,  we  have  also  established  that  FCFS  minimizes  the  expected 
value  of  any  convex  function  of  the  response  time.  In  the  case  that  jobs  have  soft  real-time  deadlines 
and  are  completed  regardless  of  whether  they  make  their  deadlines,  we  have  shown  that  the  earliest 
deadline  policy  (ED)  minimizes  the  expected  value  of  any  increasing  convex  function  of  the  lag  time, 
where the lag time is defined as the difference between a job’s deadline and its actual completion
time.  The  latest  deadline  policy  (LD)  has  also  been  shown  to  maximize  this  measure.  Finally,  we 
have  been  able  to  establish  that  the  LCFS  scheduling  policy  maximizes  the  fraction  of  jobs  that 
complete  by  their  deadlines  among  the  class  of  policies  that  do  not  use  service  time  or  deadline 
information.  Details  of  this  research  may  be  found  in  [1]. 